Competitive Experience Replay (CER) is a strategy for goal-directed RL with sparse reward. In CER, a pair of agents, \(\pi _A \) and \(\pi _B\), are trained simultaneously.
Agent \(\pi _A\) is penalized when agent \(\pi _B\) has also visited nearby states, i.e. \(|s_A^i - s_B^j| < \delta \), whereas agent \(\pi _B\) is rewarded when it visits states near those of \(\pi _A\). This reward re-labelling is executed per mini-batch.
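The per-mini-batch re-labelling rule can be sketched with plain NumPy. This is a minimal illustration, not cpprb API: the pairwise distance check, the \(\pm 1\) reward shaping, and the threshold parameter `delta` are assumptions about one reasonable implementation.

```python
import numpy as np

def relabel_rewards(s_A, r_A, s_B, r_B, delta=0.1):
    """Sketch of CER reward re-labelling for one mini-batch.

    For every pair (i, j) with |s_A[i] - s_B[j]| < delta,
    agent A is penalized and agent B is rewarded.
    `delta` is a hypothetical closeness threshold.
    """
    # Pairwise Euclidean distances between the two mini-batches:
    # shape (N_A, N_B) via broadcasting.
    dist = np.linalg.norm(s_A[:, None, :] - s_B[None, :, :], axis=-1)
    near = dist < delta                # boolean mask of "close" state pairs
    r_A = r_A - near.sum(axis=1)      # punish A once per nearby B state
    r_B = r_B + near.sum(axis=0)      # reward B once per nearby A state
    return r_A, r_B
```

The vectorized mask avoids an explicit double loop over the two mini-batches, so the re-labelling cost stays small relative to the gradient step.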
Depending on how \(\pi _B\) is initialized, there are two variants of CER. Independent-CER initializes \(\pi _B\) with the task's initial state distribution. Interact-CER initializes \(\pi _B\) with random off-policy samples from \(\pi _A\).
The authors point out that CER complements HER (Hindsight Experience Replay); combining CER with HER further improves performance.
Thanks to its flexible environment design, cpprb can store pairs of transitions from two agents in a single buffer. However, the current version of cpprb does not support sampling whole episodes instead of individual transitions.